 equilibrium policy



Solving Continuous Mean Field Games: Deep Reinforcement Learning for Non-Stationary Dynamics

Magnino, Lorenzo, Shao, Kai, Wu, Zida, Shen, Jiacheng, Laurière, Mathieu

arXiv.org Artificial Intelligence

Mean field games (MFGs) have emerged as a powerful framework for modeling interactions in large-scale multi-agent systems. Despite recent advancements in reinforcement learning (RL) for MFGs, existing methods are typically limited to finite spaces or stationary models, hindering their applicability to real-world problems. This paper introduces a novel deep reinforcement learning (DRL) algorithm specifically designed for non-stationary continuous MFGs. The proposed approach builds upon a Fictitious Play (FP) methodology, leveraging DRL for best-response computation and supervised learning for average policy representation. Furthermore, it learns a representation of the time-dependent population distribution using a Conditional Normalizing Flow. To validate the effectiveness of our method, we evaluate it on three different examples of increasing complexity. By addressing critical limitations in scalability and density approximation, this work represents a significant advancement in applying DRL techniques to complex MFG problems, bringing the field closer to real-world multi-agent systems.
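The Fictitious Play averaging step mentioned in this abstract can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes a static two-option congestion game (the cost of an option equals the fraction of the population choosing it), replaces the DRL best response with an exact argmin, and uses no neural networks or normalizing flows. The names `best_response` and `fictitious_play` are hypothetical.

```python
from typing import List


def best_response(population: List[float]) -> List[float]:
    """Pure best response: put all mass on the least crowded option."""
    cheapest = min(range(len(population)), key=lambda i: population[i])
    return [1.0 if i == cheapest else 0.0 for i in range(len(population))]


def fictitious_play(initial: List[float], iterations: int) -> List[float]:
    """Average the best responses computed against the running average."""
    average = list(initial)
    for k in range(1, iterations + 1):
        response = best_response(average)
        # Running-average update: avg <- avg + (BR - avg) / (k + 1)
        average = [a + (r - a) / (k + 1) for a, r in zip(average, response)]
    return average


# Start with everyone on option 0; the average drifts toward an even split.
mu = fictitious_play([1.0, 0.0], iterations=1000)
```

In this toy setting the averaged distribution approaches the even split, which is the equilibrium of the assumed congestion cost. The continuous, time-dependent setting described in the abstract replaces each of these pieces with a learned approximation.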



DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy

Xu, Kaixuan, Chai, Jiajun, Li, Sicheng, Fu, Yuqian, Zhu, Yuanheng, Zhao, Dongbin

arXiv.org Artificial Intelligence

Diplomacy is a complex multiplayer game that requires both cooperation and competition, posing significant challenges for AI systems. Traditional methods rely on equilibrium search to generate extensive game data for training, which demands substantial computational resources. Large Language Models (LLMs) offer a promising alternative, leveraging pre-trained knowledge to achieve strong performance with relatively small-scale fine-tuning. However, applying LLMs to Diplomacy remains challenging due to the exponential growth of possible action combinations and the intricate strategic interactions among players. To address this challenge, we propose DipLLM, a fine-tuned LLM-based agent that learns equilibrium policies for Diplomacy. DipLLM employs an autoregressive factorization framework to simplify the complex task of multi-unit action assignment into a sequence of unit-level decisions. By defining an equilibrium policy within this framework as the learning objective, we fine-tune the model using only 1.5% of the data required by the state-of-the-art Cicero model, surpassing its performance. Our results demonstrate the potential of fine-tuned LLMs for tackling complex strategic decision-making in multiplayer games.
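The autoregressive factorization described in this abstract amounts to writing the joint action probability as a product of unit-level conditionals, pi(a_1, ..., a_n | s) = prod_i pi(a_i | s, a_1, ..., a_{i-1}). The following is a minimal sketch of that identity only. The function names, the two-action vocabulary, and the toy conditional distribution are all assumptions made for illustration and do not reflect DipLLM's actual model or interface.

```python
import math
from typing import Callable, Dict, Sequence

# Hypothetical signature: (state, earlier unit actions) -> distribution over this unit's actions
UnitPolicy = Callable[[str, Sequence[str]], Dict[str, float]]


def joint_log_prob(unit_policy: UnitPolicy, state: str, actions: Sequence[str]) -> float:
    """Sum unit-level log-probabilities, each conditioned on the earlier units' actions."""
    total = 0.0
    for i, action in enumerate(actions):
        distribution = unit_policy(state, actions[:i])
        total += math.log(distribution[action])
    return total


def toy_unit_policy(state: str, previous: Sequence[str]) -> Dict[str, float]:
    """Toy conditional (an assumption): a unit prefers to move if the previous unit moved."""
    if previous and previous[-1] == "move":
        return {"hold": 0.25, "move": 0.75}
    return {"hold": 0.5, "move": 0.5}


# Joint probability of (move, move, hold) is 0.5 * 0.75 * 0.25.
log_p = joint_log_prob(toy_unit_policy, "s0", ["move", "move", "hold"])
```

Scoring one unit at a time keeps each decision over a small action set, instead of enumerating every combination of unit actions at once.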


Online Competitive Information Gathering for Partially Observable Trajectory Games

Krusniak, Mel, Xu, Hang, Palermo, Parker, Laine, Forrest

arXiv.org Artificial Intelligence

Game-theoretic agents must make plans that optimally gather information about their opponents. These problems are modeled by partially observable stochastic games (POSGs), but planning in fully continuous POSGs is intractable without heavy offline computation or assumptions on the order of belief maintained by each player. We formulate a finite history/horizon refinement of POSGs which admits competitive information gathering behavior in trajectory space, and through a series of approximations, we present an online method for computing rational trajectory plans in these games which leverages particle-based estimations of the joint state space and performs stochastic gradient play. We also provide the necessary adjustments required to deploy this method on individual agents. The method is tested in continuous pursuit-evasion and warehouse-pickup scenarios (alongside extensions to $N > 2$ players and to more complex environments with visual and physical obstacles), demonstrating evidence of active information gathering and outperforming passive competitors.


Distributed Offloading in Multi-Access Edge Computing Systems: A Mean-Field Perspective

Aggarwal, Shubham, Zaman, Muhammad Aneeq uz, Bastopcu, Melih, Ulukus, Sennur, Başar, Tamer

arXiv.org Artificial Intelligence

Multi-access edge computing (MEC) technology is a promising solution to assist power-constrained IoT devices by providing additional computing resources for time-sensitive tasks. In this paper, we consider the problem of optimal task offloading in MEC systems with due consideration of the timeliness and scalability issues under two scenarios of equitable and priority access to the edge server (ES). In the first scenario, we consider a MEC system consisting of $N$ devices assisted by one ES, where the devices can split task execution between a local processor and the ES, with equitable access to the ES. In the second scenario, we consider a MEC system consisting of one primary user, $N$ secondary users and one ES. The primary user has priority access to the ES while the secondary users have equitable access to the ES amongst themselves. In both scenarios, due to the power consumption associated with utilizing the local resource and task offloading, the devices must optimize their actions. Additionally, since the ES is a shared resource, other users' offloading activity serves to increase latency incurred by each user. We thus model both scenarios using a non-cooperative game framework. However, the presence of a large number of users makes it nearly impossible to compute the equilibrium offloading policies for each user, which would require a significant information exchange overhead between users. Thus, to alleviate such scalability issues, we invoke the paradigm of mean-field games to compute approximate Nash equilibrium policies for each user using their local information, and further study the trade-offs between increasing information freshness and reducing power consumption for each user. Using numerical evaluations, we show that our approach can recover the offloading trends displayed under centralized solutions, and provide additional insights into the results obtained.


Contextual Bandits for Evaluating and Improving Inventory Control Policies

Foster, Dean, Jia, Randy, Madeka, Dhruv

arXiv.org Machine Learning

Solutions to address the periodic review inventory control problem with nonstationary random demand, lost sales, and stochastic vendor lead times typically involve making strong assumptions on the dynamics for either approximation or simulation, and applying methods such as optimization, dynamic programming, or reinforcement learning. Therefore, it is important to analyze and evaluate any inventory control policy, in particular to see if there is room for improvement. We introduce the concept of an equilibrium policy, a desirable property of a policy that intuitively means that, in hindsight, changing only a small fraction of actions does not result in materially more reward. We provide a light-weight contextual bandit-based algorithm to evaluate and occasionally tweak policies, and show that this method achieves favorable guarantees, both theoretically and in empirical studies.


Multi-agent Attention Actor-Critic Algorithm for Load Balancing in Cellular Networks

Kang, Jikun, Wu, Di, Wang, Ju, Hossain, Ekram, Liu, Xue, Dudek, Gregory

arXiv.org Artificial Intelligence

To address this problem, BSs can work collaboratively to deliver a smooth migration (or handoff) and satisfy the UEs' service requirements. This paper formulates the load balancing problem as a Markov game and proposes a Robust Multi-agent Attention Actor-Critic (Robust-MA3C) algorithm that can facilitate collaboration among the BSs (i.e., agents). In particular, to solve the Markov game and find a Nash equilibrium policy, we embrace the idea of adopting a nature agent to model the system uncertainty. Moreover, we utilize the self-attention mechanism, which encourages high-performance BSs to assist low-performance BSs. In addition, we consider two types of schemes, which can facilitate load balancing for both active UEs and idle UEs. We carry out extensive evaluations by simulations, and the results illustrate that, compared to state-of-the-art MARL methods, the Robust-MA3C scheme can improve the overall performance by up to 45%.


Learning Individual Policies in Large Multi-agent Systems through Local Variance Minimization

Verma, Tanvi, Varakantham, Pradeep

arXiv.org Artificial Intelligence

In multi-agent systems with a large number of agents, the contribution of each agent to the value of other agents is typically minimal (e.g., aggregation systems such as Uber, Deliveroo). In this paper, we consider such multi-agent systems, where each agent is self-interested and takes a sequence of decisions, and represent them as a Stochastic Non-atomic Congestion Game (SNCG). We derive key properties for equilibrium solutions in the SNCG model with non-atomic and also nearly non-atomic agents. With those key equilibrium properties, we provide a novel Multi-Agent Reinforcement Learning (MARL) mechanism that minimizes variance across values of agents in the same state. To demonstrate the utility of this new mechanism, we provide detailed results on a real-world taxi dataset and also a generic simulator for aggregation systems. We show that our approach reduces the variance in revenues earned by taxi drivers, while still providing higher joint revenues than leading approaches.


Learning in Discounted-cost and Average-cost Mean-field Games

Anahtarcı, Berkay, Karıksız, Can Deha, Saldi, Naci

arXiv.org Artificial Intelligence

We consider learning approximate Nash equilibria for discrete-time mean-field games with nonlinear stochastic state dynamics subject to both average and discounted costs. To this end, we introduce a mean-field equilibrium (MFE) operator, whose fixed point is a mean-field equilibrium (i.e., an equilibrium in the infinite-population limit). We first prove that this operator is a contraction, and propose a learning algorithm to compute an approximate mean-field equilibrium by approximating the MFE operator with a random one. Moreover, using the contraction property of the MFE operator, we provide an error analysis of the proposed learning algorithm. We then show that the learned mean-field equilibrium constitutes an approximate Nash equilibrium for finite-agent games.
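The role of the contraction property can be seen in a generic fixed-point iteration: repeatedly applying a contraction converges to its unique fixed point from any starting value. The sketch below uses a scalar toy map as a stand-in. It is not the paper's MFE operator, and `fixed_point` and `toy_operator` are hypothetical names introduced for illustration.

```python
from typing import Callable


def fixed_point(
    operator: Callable[[float], float],
    start: float,
    tolerance: float = 1e-10,
    max_iterations: int = 10_000,
) -> float:
    """Iterate x <- operator(x) until successive values differ by less than tolerance."""
    current = start
    for _ in range(max_iterations):
        updated = operator(current)
        if abs(updated - current) < tolerance:
            return updated
        current = updated
    raise RuntimeError("fixed-point iteration did not converge")


def toy_operator(x: float) -> float:
    """Toy contraction with modulus 0.5; its unique fixed point is x = 2."""
    return 0.5 * x + 1.0


approximate_fixed_point = fixed_point(toy_operator, start=0.0)
```

With a contraction modulus below one, the error shrinks geometrically at each step. This geometric decay is the kind of property an error analysis can exploit when the exact operator is replaced by an approximate one.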